dominant strategy
Supplementary Material
We provide additional results for EGTA applied to networked MARL system control for CPR management. Specifically, we investigate the consequences of different reward structures. Potential Nash equilibria are shaded in blue. NeurComm (across all values of α), which is likely due to its consensus update mechanism. The orange ovals in these diagrams indicate which system configurations correspond to the highest expected payoff for all agents.
Humans expect rationality and cooperation from LLM opponents in strategic games
Barak, Darija, Costa-Gomes, Miguel
As Large Language Models (LLMs) integrate into our social and economic interactions, we need to deepen our understanding of how humans respond to LLM opponents in strategic settings. We present the results of the first controlled monetarily-incentivised laboratory experiment looking at differences in human behaviour in a multi-player p-beauty contest against other humans and LLMs. We use a within-subject design in order to compare behaviour at the individual level. We show that, in this environment, human subjects choose significantly lower numbers when playing against LLMs than when playing against humans, which is mainly driven by the increased prevalence of `zero' Nash-equilibrium choices. This shift is mainly driven by subjects with high strategic reasoning ability. Subjects who play the zero Nash-equilibrium choice motivate their strategy by appealing to the LLMs' perceived reasoning ability and, unexpectedly, propensity towards cooperation. Our findings provide foundational insights into multi-player human-LLM interaction in simultaneous choice games, uncover heterogeneities in both subjects' behaviour and their beliefs about LLMs' play when playing against them, and suggest important implications for mechanism design in mixed human-LLM systems.
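The link between strategic reasoning depth and the `zero' choice can be illustrated with a small level-k sketch. The abstract does not state the multiplier p, the number range, or the level-0 guess, so the values below (p = 2/3, range 0 to 100, level-0 guess of 50) are assumptions taken from the textbook version of the game, and the function names are hypothetical. The sketch also ignores a player's own influence on the average.

```python
def level_k_choice(k, p=2/3, level0=50.0):
    """Number chosen by a level-k reasoner.

    Assumption: a level-0 player guesses `level0`, and each higher level
    best-responds to the level below by multiplying its guess by p.
    """
    return level0 * p ** k


def winner(choices, p=2/3):
    """Index of the choice closest to p times the average (first one on ties)."""
    target = p * sum(choices) / len(choices)
    return min(range(len(choices)), key=lambda i: abs(choices[i] - target))


if __name__ == "__main__":
    # Deeper reasoning pushes the choice toward the zero Nash equilibrium.
    for k in (0, 1, 2, 5, 20):
        print(k, round(level_k_choice(k), 4))
```

Under these assumptions, a subject who believes the opponents reason very deeply (or coordinate on the equilibrium) has a reason to choose a number near zero, which is consistent with the motivation the subjects report.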
Strategyproof Reinforcement Learning from Human Feedback
Buening, Thomas Kleine, Gan, Jiarui, Mandal, Debmalya, Kwiatkowska, Marta
We study Reinforcement Learning from Human Feedback (RLHF), where multiple individuals with diverse preferences provide feedback strategically to sway the final policy in their favor. We show that existing RLHF methods are not strategyproof, which can result in learning a substantially misaligned policy even when only one out of $k$ individuals reports their preferences strategically. In turn, we also find that any strategyproof RLHF algorithm must perform $k$-times worse than the optimal policy, highlighting an inherent trade-off between incentive alignment and policy alignment. We then propose a pessimistic median algorithm that, under appropriate coverage assumptions, is approximately strategyproof and converges to the optimal policy as the number of individuals and samples increases.
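A rough intuition for why a median-type rule limits manipulation can be sketched with scalar reports. This is not the paper's pessimistic median algorithm: it only compares plain mean and median aggregation when one of k individuals misreports, and the report values are hypothetical.

```python
from statistics import mean, median

# Hypothetical honest reports from k = 5 individuals about one quantity.
honest = [0.2, 0.4, 0.5, 0.6, 0.8]

# Assumption: the last individual exaggerates to pull the aggregate upward.
strategic = honest[:-1] + [100.0]

# How far a single strategic report moves each aggregate.
mean_shift = abs(mean(strategic) - mean(honest))
median_shift = abs(median(strategic) - median(honest))

if __name__ == "__main__":
    print("shift of the mean:  ", mean_shift)
    print("shift of the median:", median_shift)
```

In this toy setting the single exaggerated report moves the mean substantially while the median stays where it was, which is the kind of robustness that motivates median-based aggregation.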
Paying to Do Better: Games with Payments between Learning Agents
Kolumbus, Yoav, Halpern, Joe, Tardos, Éva
In repeated games, such as auctions, players typically use learning algorithms to choose their actions. The use of such autonomous learning agents has become widespread on online platforms. In this paper, we explore the impact of players incorporating monetary transfers into their agents' algorithms, aiming to incentivize behavior in their favor. Our focus is on understanding when players have incentives to make use of monetary transfers, how these payments affect learning dynamics, and what the implications are for welfare and its distribution among the players. We propose a simple game-theoretic model to capture such scenarios. Our results on general games show that in a broad class of games, players benefit from letting their learning agents make payments to other learners during the game dynamics, and that in many cases, this kind of behavior improves welfare for all players. Our results on first- and second-price auctions show that in equilibria of the ``payment policy game,'' the agents' dynamics can reach strong collusive outcomes with low revenue for the auctioneer. These results highlight a challenge for mechanism design in systems where automated learning agents can benefit from interacting with their peers outside the boundaries of the mechanism.
The coupling effect between the environment and strategies drives the emergence of group cooperation
Di, Changyan, Zhou, Qingguo, Shen, Jun, Wang, Jinqiang, Zhou, Rui, Wang, Tianyi
The coupling effect between the macro environment and individual behavior is the key factor in resolving the social dilemma. In a static environment, rewards of different strategies are compared simultaneously, leading to a social dilemma due to the higher payoff of defection compared to cooperation. However, when individuals are placed in a dynamic environment that is coupled with their actions, we find that the expected payoffs of different strategies are not fixed but undergo dynamic changes. The higher expected payoff of defection can be diluted over time due to environmental degradation caused by an excessive number of defectors, while cooperation may become the dominant strategy if positively reinforced by environmental feedback. Group cooperation emerges as a direct result of a mutually reinforcing positive feedback loop among the environment, immediate rewards, and individual actions (or group states). Although the agents are unaware of the macro-level context, they can discern the inflection point of the environment solely through their rewards. At this pivotal moment, agents experience a surge in immediate rewards, which triggers a positive feedback loop among the environment, their rewards, and their current actions. Consequently, cooperation emerges within the group.
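The static-environment case, where defection pays more than cooperation whatever the other player does, can be made concrete with a small dominance check. The payoff numbers below are hypothetical prisoner's-dilemma values, not taken from the paper, and the sketch covers only the static case, not the paper's dynamic environment model.

```python
def strictly_dominant_action(payoff):
    """Return the row player's strictly dominant action, or None.

    payoff[a][b] is the row player's payoff for own action a
    against opponent action b.
    """
    actions = list(payoff)
    for a in actions:
        # a is strictly dominant if it beats every other action
        # against every opponent action.
        if all(
            payoff[a][b] > payoff[other][b]
            for other in actions if other != a
            for b in payoff[a]
        ):
            return a
    return None


# Hypothetical static social dilemma (illustrative payoffs only).
STATIC_DILEMMA = {
    "cooperate": {"cooperate": 3, "defect": 0},
    "defect": {"cooperate": 5, "defect": 1},
}

if __name__ == "__main__":
    print(strictly_dominant_action(STATIC_DILEMMA))
```

With fixed payoffs of this shape, defection is strictly dominant; the abstract's point is that environmental feedback can change the payoffs over time so that this ordering no longer holds.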
Effect of Monetary Reward on Users' Individual Strategies Using Co-Evolutionary Learning
Ueki, Shintaro, Toriumi, Fujio, Sugawara, Toshiharu
Consumer generated media (CGM), such as social networking services, rely on the voluntary activity of users to prosper; users in turn gain the psychological reward of feeling connected with other people through the comments and reviews they receive online. To attract more users, some CGM have introduced monetary rewards (MR) for posting activity and quality articles and comments. However, the impact of MR on users' article posting strategies, especially frequency and quality, has not been fully analyzed by previous studies, because they ignored differences in users' standpoints in the CGM network, such as how many friends or followers they have, although we think that strategies vary with these standpoints. The purpose of this study is to investigate the impact of MR on individual users by considering the differences in dominant strategies across user standpoints. Using a game-theoretic model for CGM, we experimentally show that a variety of realistic dominant strategies evolve depending on user standpoints in the CGM network, using a multiple-world genetic algorithm.
Equilibrium and Learning in Fixed-Price Data Markets with Externality
We propose modeling real-world data markets, where sellers post fixed prices and buyers are free to purchase from any set of sellers, as a simultaneous-move game between the buyers. A key component of this model is the negative externality buyers induce on one another due to purchasing data with a competitive advantage, a phenomenon exacerbated by data's easy replicability. We consider two settings. In the simpler complete-information setting, where all buyers know their valuations, we characterize both the existence and welfare properties of the pure-strategy Nash equilibrium in the presence of buyer externality. While this picture is bleak without any market intervention, reinforcing the limitations of current data markets, we prove that for a standard class of externality functions, market intervention in the form of a transaction cost can lead to a pure-strategy equilibrium with strong welfare guarantees. We next consider a more general setting where buyers start with unknown valuations and learn them over time through repeated data purchases. Our intervention is feasible in this regime as well, and we provide a learning algorithm for buyers in this online scenario that, under some natural assumptions, achieves low regret with respect to both individual and cumulative utility metrics. Lastly, we analyze the promise and shortfalls of this intervention under a much richer model of externality. Our work paves the way for investigating simple interventions for existing data markets to address their shortcomings and the unique challenges put forth by data products.
Coopetition Against an Amazon
Gradwohl, Ronen (Ariel University) | Tennenholtz, Moshe (Technion)
This paper analyzes cooperative data-sharing between competitors vying to predict a consumer's tastes. We design optimal data-sharing schemes both for when they compete only with each other, and for when they additionally compete with an Amazon, a company with more, better data. We show that simple schemes (threshold rules that probabilistically induce either full data-sharing between competitors, or the full transfer of data from one competitor to another) are either optimal or approximately optimal, depending on properties of the information structure. We also provide conditions under which firms share more data when they face stronger outside competition, and describe situations in which this conclusion is reversed.
How and Why to Manipulate Your Own Agent
This paper deals with the following common type of scenario: several users engage in some strategic online interaction, where each of them is assisted by a learning agent. A typical example is advertisers that compete for advertising slots over some platform. Typically, each of these advertisers enters his key parameters into some advertiser-facing website, and then this website's "agent" participates on the advertiser's behalf in a sequence of auctions for ad slots. Often, the platform designer provides this agent as its advertiser-facing user interface. In cases where the platform's agent does not optimize sufficiently well for the advertiser (but rather, say, for the auctioneer), one would expect some other company to provide a better (for the advertiser) agent.